We study semiparametric varying-coefficient partially linear models when some linear covariates are not observed, but ancillary variables are available. Semiparametric profile least-square based estimation procedures are developed for parametric and nonparametric components after we calibrate the error-prone covariates. Asymptotic properties of the proposed estimators are established. We also propose the profile least-square based ratio test and Wald test to identify significant parametric and nonparametric components. To improve accuracy of the proposed tests for small or moderate sample sizes, a wild bootstrap version is also proposed to calculate the critical values. Intensive simulation experiments are conducted to illustrate the proposed approaches.

1. Introduction. Various efforts have been made to balance the interpretation of linear models and flexibility of nonparametric models. Important results from these efforts include semiparametric varying-coefficient partially linear models (SVCPLM), in which the response variable $Y$ depends on variables $Z$, $X$ and $U$ in the form of

(1.1)    $Y = \Theta^T Z + \alpha^T(U) X + \varepsilon,$

where $\Theta$ is a $p$-dimensional vector of unknown parameters, $\alpha(\cdot)$ is a $q$-variate vector of unknown functions, $U$ is a vector of nonparametric components that may be multivariate and the model error $\varepsilon$ has mean zero and finite variance. For notational simplicity, we assume that $U$ is scalar. $\alpha^T(U) X$ is referred to as a nonparametric component since $\alpha(U)$ is nonparametric.

Model (1.1) permits the interaction between the covariates U and X in 
such a way that a different level of covariate U is associated with a different 
linear model about $\Theta^T Z$, and allows one to examine the extent to which
covariates X interact. This model presents a novel and general structure, 
which indeed covers many well-studied, important semiparametric regres- 
sion models. For example, when Z = 0, (1.1) reduces to varying-coefficient 
models, which were originally proposed by Hastie and Tibshirani (1993) and 
studied by Fan and Zhang (1999), Xia and Li (1999) and Cai, Fan and Li 
(2000). When q = 1 and X = 1, (1.1) reduces to well-known partially lin- 
ear models, in which Y depends on Z in a linear way but is related to 
another independent variable U in an unspecified form. There is a great 
deal of literature on the study of partially linear models [e.g., Engle et al. 
(1986), Robinson (1988) and Speckman (1988)]. A survey of partially linear 
models was given by Härdle, Liang and Gao (2000). The SVCPLM
has been studied by Zhang, Lee and Song (2002) and Fan and Huang
(2005), among others. Zhang, Lee and Song (2002) developed the proce- 
dures for estimation of the linear and nonparametric parts of the SVCPLM. 
Fan and Huang (2005) proposed a profile likelihood technique for estimating 
parametric components and established the asymptotic normality of their 
proposed estimator. 

All existing studies of the SVCPLM are limited to exactly observed data. However, in biomedical research observations are often measured with error. Simply ignoring measurement errors, known as the naive method, will result in biased estimators. Various attempts have been made to correct for such bias; see Fuller (1987) and Carroll et al. (2006) for extensive discussions and examples of linear and nonlinear models with measurement errors. In this paper, we are concerned with the situation where some components ($\xi$) of $Z$ are not observed directly, but auxiliary information is available to recover $\xi$.

Let $Z = (\xi^T, W^T)^T$, where $\xi$ is a $p_1 \times 1$ vector and $W$ is a vector of the remaining observed components. We assume that $\xi$ is related to observed $\eta$ and $V$ through the relationship $\xi = E(\eta|V)$. Thus, we study the following model:

(1.2)    $Y = \beta^T \xi + \theta^T W + \alpha^T(U) X + \varepsilon,$
         $\eta = \xi(V) + e,$

where $E(\varepsilon|Z, X, U) = 0$, $E(\varepsilon^2|Z, X, U) = \sigma^2(Z, X, U)$ and $e$ is an error with mean zero and positive finite covariance matrix $\Sigma_e = E(ee^T)$. The four covariates $V$, $W$, $X$ and $U$ are different. In our structure, we allow that $V$ and $(X, W, U)$ may overlap. Model (1.2) is flexible enough to include a variety
of models of interest. We give three examples to illustrate its flexibility: 

Example 1 (Errors-in-variable models with validation data). $Z$ is a $p$-variate vector and is not observed. $\tilde Z$ is another $p$-variate vector that is observed and associated with $Z$. Assume that we have primary observations $\{Y_j, \tilde Z_j, U_j, j = 1, \ldots, n\}$ and $n_0$ independent validation observations $\{Z_i, \tilde Z_i, U_i, i = n + 1, \ldots, n + n_0\}$, which are independent of the primary observations. Let $V = (\tilde Z^T, U)^T$. The partial errors-in-variable model with validation data is written as

(1.3)    $Y = \beta^T E(Z|V) + \alpha(U) + \tilde\varepsilon,$
         $\tilde\varepsilon = \varepsilon + \beta^T\{Z - E(Z|V)\}.$

This model has been studied by Sepanski and Lee (1995), Sepanski and Carroll (1993) and Sepanski, Knickerbocker and Carroll (1994). Taking $X = 1$, $\theta = 0$, $\eta = Z$ and $\xi = E(Z|V)$ in (1.2), we see that (1.3) is a sub-model of (1.2).

Example 2 (De-noise linear model). The relation between the response variable $Y$ and covariates $(\xi, W)$ is described by $Y = \beta^T\xi + \theta^T W + \varepsilon$, where $\beta$ and $\theta$ are parameter vectors. The covariate $\xi$ is measured with error: instead of observing $\xi$ directly, we observe its surrogate $\eta$. This forms a de-noise linear model:

(1.4)    $Y = \beta^T\xi + \theta^T W + \varepsilon, \qquad \eta = \xi + e,$

where $\xi = \xi(t)$ is subject to measurement error at time $t$ and the measurement errors $\varepsilon$ and $e$ are independent of each other at each time $t$.

Cai, Naik and Tsai (2000) used this model to estimate the relationship 
between awareness and television rating points of TV commercials for certain 
products. Cui, He and Zhu (2002) proposed an estimator of the coefficients 
and established asymptotic results of the proposed estimator. It is easy to 
see that (1.2) includes (1.4). 

Example 3 (Rational expectation model). Consider the following rational expectation model:

(1.5)    $Y_t = \gamma^T S_t + \beta^T\{\eta_t - E(\eta_t|V_t)\} + \varepsilon_t,$

where $\eta_t - E(\eta_t|V_t)$ is the expectation payoff for price variable $\eta_t$ given historical information $V_t$. In this model, $(Y_t, S_t, \eta_t, V_t)$, except $E(\eta_t|V_t)$, can be observed directly.
Besides estimation and inference of $\gamma$ and $\beta$, within the econometric community the following model is of interest:

(1.6)    $Y_t = \gamma^T S_t + \zeta^T \eta_t - \beta^T E(\eta_t|V_t) + \varepsilon_t.$

It is worth noting that (1.6) is a sub-model of (1.2). An interesting question is whether (1.6) satisfies the rational expectation model (1.5), that is, to test the following hypothesis:

(1.7)    $H_0: \beta = \zeta$  vs  $H_1: \beta \ne \zeta.$

In the econometric literature, the regression of unobserved covariates is 
also called generated regression. This topic has been widely studied. Pagan 
(1984) gave a comprehensive review on the estimation of parametric models 
with generated regression. Ai and McFadden (1997) presented a procedure
for analyzing a partially specified nonlinear regression model in which the 
nuisance parameter is an unrestricted function of a subset of regressors. 
Ahn and Powell (1993) and Powell (1987) considered the case with the gen- 
erated regressors in the nonparametric part of the model. Li (2002) consid- 
ered the problems of estimating a semiparametric partially linear model for 
dependent data with generated regressors. Their models are special cases of 
the rational expectation model. 

Various procedures similar to generated regression have been proposed 
to reduce the bias due to mismeasurement. Regression calibration and simulation extrapolation have been developed for measurement error models [Carroll et al. (2006)]. Liang, Härdle and Carroll (1999) studied a special case
of (1.2), partially linear errors-in-variables models, and proposed an atten- 
uated estimator of the parameter based on the semiparametric likelihood 
estimate. Wang and Pepe (2000) used a pseudo-expected estimating equa- 
tion method to estimate the parameter in order to correct the estimation 
bias. 

In an attempt to develop a unified estimation procedure for (1.2), we propose a profile-based procedure, which is similar in spirit to the regression calibration method. The procedure consists of two steps. In the first step, we calibrate the error-prone covariate $\xi$ by using ancillary information and applying nonparametric regression techniques. In the second step, we use the profile least-square-based principle to estimate the parametric and nonparametric components. Under mild assumptions, we derive asymptotic representations of the proposed estimators, and use these representations to establish asymptotic normality. We also propose the profile least-square-based ratio test and Wald test for the parametric part of (1.2), and a goodness-of-fit test for the varying coefficients in the nonparametric part. The asymptotic distributions of the proposed test statistics are derived. Wild bootstrap versions are introduced to calculate the critical values for those tests.






The paper is organized as follows: In Section 2, we focus on the estimation 
of the parameters and nonparametric functions, and on the development of 
asymptotic properties of the resulting estimators. The error-prone covari- 
ates are first calibrated. Bandwidth selection strategy is also discussed. In 
Section 3, we develop profile least-square-based ratio tests for parametric 
and nonparametric components. Wild bootstrap methods are proposed to 
calculate the critical values. The results of applications to simulated and 
real data are reported in Section 4. Section 5 gives a conclusion. Regularity 
assumptions and technical proofs are relegated to the Appendix. 

2. Estimation of the parametric and nonparametric components. When $\xi$ is observed, estimators of $\beta$ and $\alpha(u)$ and associated tests have been developed to study (1.2). These estimators and tests cannot be used directly when $\xi$ is unobservable. We first need to calibrate $\xi$ by using ancillary variables $\eta$ and $V$, because a direct replacement of $\xi$ by $\eta$ will result in bias.



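2.1. Covariate calibration. For notational simplicity, we assume $V$ is univariate in the remainder of this paper. Let $\eta_{i,k}$ be the $k$th entry of vector $\eta_i$, and $L_b(\cdot) = L(\cdot/b)/b$, where $b = b_k$ $(k = 1, 2, \ldots, p_1)$ is a bandwidth for the $k$th component of $\eta$. Assume throughout the paper that $\xi_k(v)$ has $r + 1$ derivatives and we approximate $\xi_k(v)$ by an $r$-order polynomial within the neighborhood of $v_0$ via Taylor expansion

$$\xi_k(v) \approx \xi_k(v_0) + \xi_k'(v_0)(v - v_0) + \cdots + \frac{\xi_k^{(r)}(v_0)}{r!}(v - v_0)^r \equiv \sum_{j=0}^{r} a_{j,k}(v - v_0)^j.$$

Denote

$$V_v = \begin{pmatrix} 1 & (V_1 - v) & \cdots & (V_1 - v)^r \\ \vdots & \vdots & & \vdots \\ 1 & (V_n - v) & \cdots & (V_n - v)^r \end{pmatrix}, \qquad \eta^{(k)} = \begin{pmatrix} \eta_{1,k} \\ \vdots \\ \eta_{n,k} \end{pmatrix},$$

and $W_v = \mathrm{diag}\{L_b(V_1 - v), \ldots, L_b(V_n - v)\}$. The local polynomial estimator [Fan and Gijbels (1996)] of $(a_{0,k}, \ldots, a_{r,k})^T$ can be expressed as $\hat a_k = (V_v^T W_v V_v)^{-1} V_v^T W_v \eta^{(k)}$. As a consequence, $\xi_k(v)$ is estimated by $\hat\xi_k(v) = \mathbf{e}_1^T (V_v^T W_v V_v)^{-1} V_v^T W_v \eta^{(k)}$, for $k = 1, \ldots, p_1$, where $\mathbf{e}_1$ is a $(r + 1) \times 1$ vector with 1 in the first position and 0 in other positions.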
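
The calibration step can be illustrated with a short numerical sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: it takes the kernel $L$ to be Gaussian, and the function names `local_poly_fit` and `calibrate` are hypothetical. It solves the weighted least-squares problem $(V_v^T W_v V_v)^{-1} V_v^T W_v \eta^{(k)}$ at a point and returns its first coordinate.

```python
import numpy as np


def gaussian_kernel(t):
    """Standard normal density, used here as the kernel L (an assumption)."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)


def local_poly_fit(v0, V, eta, b, r=1):
    """Local polynomial estimate of xi(v0) = E(eta | V = v0).

    V   : (n,) observed covariate
    eta : (n,) or (n, p1) surrogate observations
    b   : bandwidth; r : polynomial order
    """
    V = np.asarray(V, dtype=float)
    eta = np.asarray(eta, dtype=float)
    d = V - v0
    design = np.vander(d, N=r + 1, increasing=True)  # columns 1, d, ..., d^r
    w = gaussian_kernel(d / b) / b                   # L_b(V_i - v0)
    VtW = design.T * w                               # V_v^T W_v
    coef = np.linalg.solve(VtW @ design, VtW @ eta)  # (V^T W V)^{-1} V^T W eta
    return coef[0]                                   # first coordinate: xi_hat(v0)


def calibrate(V, eta, b, r=1):
    """Evaluate the calibrated covariate xi_hat at every observed V_i."""
    return np.array([local_poly_fit(v, V, eta, b, r) for v in V])
```

When `eta` is an exact polynomial of degree at most `r` in `V`, the fit reproduces it exactly, which gives a quick sanity check of the algebra.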

In what follows, denote $A^{\otimes 2} = AA^T$, $\mu_j = \int u^j L(u)\,du$, $\nu_j = \int u^j L^2(u)\,du$, $S_u = (\mu_{j+l})_{0 \le j, l \le r}$ and $c_p = (\mu_{r+1}, \ldots, \mu_{2r+1})^T$. $f_V(v)$ is the density function of $V$.

Under the assumptions given in the Appendix, we can prove [Fan and Gijbels (1996), pages 101-103 or Carroll et al. (1997), page 486] that

(2.1)    $\hat\xi(v) - \xi(v) = [\ldots]\,\dfrac{\xi^{(r+1)}(v)}{(r + 1)!} + \dfrac{1}{n f_V(v)}\,[\ldots]$

uniformly on $v \in \mathcal{V}$. This fact will be used for proving the main results in the Appendix.

2.2. Estimation of the parametric component. Let $(Y_i, \eta_i, V_i, W_i, X_i, U_i)$, $i = 1, 2, \ldots, n$, be the observations from (1.2). The unknown covariates $\xi_i$ are substituted by their estimators $\hat\xi_i$ given in the above section. We therefore have the following "new" model:

(2.2)    $Y_i = \beta^T \hat\xi_i + \theta^T W_i + \alpha^T(U_i) X_i + \varepsilon_i, \qquad i = 1, \ldots, n,$

where $\{\varepsilon_i\}_{i=1}^n$ are still treated as errors. If $\hat\xi_i$ were an unbiased estimator of $\xi_i$, then $E\varepsilon_i = 0$.

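Approximate $\alpha_j(U)$ within a neighborhood of $u$ by $a_j(u) + b_j(u)(U - u)$ for $j = 1, \ldots, q$. Write $\hat Z_i = (\hat\xi_i^T, W_i^T)^T$ and $\Theta = (\beta^T, \theta^T)^T$. Following the profile likelihood-based procedure proposed by Fan and Huang (2005), our profile least-square-based estimator of $\Theta$ is defined as

(2.3)    $\hat\Theta_n = \{\tilde Z^T \tilde Z\}^{-1} \tilde Z^T (I - S) Y,$

where $\tilde Z = (I - S)\hat Z$, $I$ is the $n \times n$ identity matrix,

$$S = \begin{pmatrix} (X_1^T \;\; 0_q^T)(D_{U_1}^T W_{U_1} D_{U_1})^{-1} D_{U_1}^T W_{U_1} \\ \vdots \\ (X_n^T \;\; 0_q^T)(D_{U_n}^T W_{U_n} D_{U_n})^{-1} D_{U_n}^T W_{U_n} \end{pmatrix}, \qquad D_u = \begin{pmatrix} X_1^T & h^{-1}(U_1 - u) X_1^T \\ \vdots & \vdots \\ X_n^T & h^{-1}(U_n - u) X_n^T \end{pmatrix}_{n \times 2q},$$

and $Y = (Y_1, \ldots, Y_n)^T$, $W_u = \mathrm{diag}\{K_h(U_1 - u), \ldots, K_h(U_n - u)\}$, $\hat Z = (\hat Z_1, \ldots, \hat Z_n)^T$, $0_q$ is the $q \times 1$ vector with all the entries being zero, $K(\cdot)$ is a kernel function, $h$ is a bandwidth and $K_h(\cdot) = K(\cdot/h)/h$.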
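
To make the construction of $S$ and the estimator (2.3) concrete, the following is a minimal sketch, assuming a Gaussian kernel $K$ and a calibrated covariate matrix that has already been computed. The function names `smoother_matrix` and `profile_ls` are hypothetical and introduced only for illustration.

```python
import numpy as np


def smoother_matrix(U, X, h):
    """n x n local linear smoothing matrix S for the varying-coefficient part.

    Row i maps a response vector to X_i^T a_hat(U_i), where a_hat is the
    local linear fit at u = U_i. A Gaussian kernel is assumed.
    """
    U = np.asarray(U, dtype=float)
    X = np.asarray(X, dtype=float)
    n, q = X.shape
    S = np.empty((n, n))
    for i in range(n):
        d = (U - U[i]) / h
        D = np.hstack([X, X * d[:, None]])                   # D_u, n x 2q
        w = np.exp(-0.5 * d**2) / (h * np.sqrt(2 * np.pi))   # K_h(U_j - u)
        DtW = D.T * w
        hat = np.linalg.solve(DtW @ D, DtW)                  # (D^T W D)^{-1} D^T W
        S[i] = np.concatenate([X[i], np.zeros(q)]) @ hat     # (X_i^T, 0_q^T) hat
    return S


def profile_ls(Y, Z_hat, U, X, h):
    """Profile least-squares estimate of Theta as in (2.3)."""
    S = smoother_matrix(U, X, h)
    I = np.eye(len(Y))
    Z_t = (I - S) @ Z_hat
    Y_t = (I - S) @ Y
    theta, *_ = np.linalg.lstsq(Z_t, Y_t, rcond=None)
    return theta, S
```

If the coefficient functions are constant and there is no noise, $S$ reproduces the nonparametric part exactly and `profile_ls` returns the true $\Theta$, which is a convenient check of an implementation.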

We now give a representation of $\hat\Theta_n$. This representation can be used to obtain the asymptotic distribution of $\sqrt{n}(\hat\Theta_n - \Theta)$, which we give in Theorem 2. This result extends the method of Fan and Huang (2005) to a SVCPLM with generated regressors.

Let $\Phi(U) = E(XZ^T|U)$, $\Gamma(U) = E(XX^T|U)$, $\psi(Z, X, U) = Z - \Phi^T(U)\Gamma^{-1}(U)X$, $B(V) = E[\{Z - \Phi^T(U)\Gamma^{-1}(U)X\}|V]$ and $\Sigma = E(ZZ^T) - E\{\Phi^T(U)\Gamma^{-1}(U)\Phi(U)\}$.
Theorem 1. Under Assumptions 1-5 in the Appendix, we have

$$\hat\Theta_n - \Theta_0 = \Sigma^{-1}\Big\{[\ldots] + \frac{1}{n}\sum_{j=1}^{n} \Delta(V_j) e_j^T \beta_0 + \frac{1}{n}\sum_{i=1}^{n} \psi(Z_i, X_i, U_i)\varepsilon_i\Big\} \times \{1 + o_p(1)\},$$

where $\Delta(V_j) = \frac{1}{n}\sum_{i=1}^{n} \psi(Z_i, X_i, U_i) L_b(V_i - V_j)/f_V(V_i)$.

Theorem 2. Let $nb^{2r+2} \to 0$. Under Assumptions 1-5 in the Appendix, $\sqrt{n}(\hat\Theta_n - \Theta)$ converges to a normal distribution with mean zero and covariance matrix $\Sigma_1$, where $\Sigma_1 = \Sigma^{-1} D \Sigma^{-1}$, $D = E[\sigma^2(X, Z, U)\{\psi(X, Z, U)\}^{\otimes 2}] + E[(e^T\beta)^2\{B(V)\}^{\otimes 2}] + \beta^T E\{E(e\varepsilon|Z, X, U, V)\{B(V)\}^{\otimes 2}\}$.

Furthermore, if $\varepsilon$ is independent of $e$ given $(Z, X, U, V)$, and $\varepsilon$ is independent of $(Z, X, U)$, the asymptotic covariance can be simplified as $\Sigma^{-1}(\sigma^2\Sigma + E[(e^T\beta)^2\{B(V)\}^{\otimes 2}])\Sigma^{-1}$. If $e$ is also independent of $V$, the asymptotic covariance can further be simplified as $\sigma^2\Sigma^{-1} + \beta^T\Sigma_e\beta\,\Sigma^{-1}E\{B(V)\}^{\otimes 2}\Sigma^{-1}$.

The proof of Theorem 2 can be completed by using Theorem 1. We omit 
the details. 

The asymptotic variance has a similar structure to that of Das (2005). The first term of the asymptotic variance can be viewed as the variance from the first stage estimation without measurement error or missing data, the second one is the variance of the second stage for estimating unobserved variables and the third one is the covariance of the two-stage estimators. If $e = 0$ in (1.2), that is, the covariate can be exactly observed, the variance of $\hat\Theta_n$ is the same as that of Fan and Huang (2005). To achieve a root-$n$ consistent estimator of $\Theta$, Theorem 2 indicates that undersmoothing is required in estimating $\xi(v)$; the optimal bandwidth does not satisfy the condition of Theorem 2.

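Example 1 (cont.). Let $\hat\beta_n$ be the estimator of $\beta$ in (1.3). Assume $n_0/n \to \lambda$. Checking the conditions of Theorem 2, we can conclude that $\sqrt{n}(\hat\beta_n - \beta_0) \to N(0, \Sigma_1)$, where $\Sigma_1 = \Sigma^{-1}(\sigma^2 + \lambda\beta^T E[E\{Z - E(Z|U)|V\}]^{\otimes 2}\beta)$ and $\Sigma = E[\{Z - E(Z|U)\}^{\otimes 2}]$.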

Example 2 (cont.). For the de-noise models introduced in Section 1, we apply Theorem 2 to derive the asymptotic distribution of the estimator $\hat\Theta$ given by Cui, He and Zhu (2002), and obtain that $\sqrt{n}(\hat\Theta - \Theta) \to N\{0, \Sigma^{-1}(\sigma^2 + \beta^T\Sigma_e\beta)\}$.



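The asymptotic covariance of $\hat\Theta_n$ can be consistently estimated by $\hat\Sigma_n = n\hat\Sigma^{-1}\hat\sigma^2 + \hat\Sigma^{-1}\hat Q\hat\Sigma^{-1}$, where $\hat\Sigma^{-1} = \{(\tilde Z^T\tilde Z)^{-1}\tilde Z^T(I - S)\}^{\otimes 2}$, $\hat Q = \frac{1}{n}\sum_{i=1}^{n}[\ldots]\{\hat B(V_i)\}^{\otimes 2}$, $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\{Y_i - \hat\alpha^T(U_i)X_i - \hat\Theta^T\hat Z_i\}^2$, $\hat B(v) = Z - \hat E\{\Phi^T(U)\Gamma^{-1}(U)X|V = v\}$ and $\hat E\{\Phi^T(U)\Gamma^{-1}(U)X|V = v\}$ is a nonparametric regression estimator of $\Phi^T(U)\Gamma^{-1}(U)X$ on $V$. $\hat\alpha(\cdot)$ will be given in the next section.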

Generally $\hat\Sigma_n$ is difficult to calculate. However, implementation becomes simpler in some cases. For example, in the errors-in-variables model with validation data, a direct simplification yields $B(V) = Z - \Phi^T(U)\Gamma^{-1}(U)X$, $D = \{\beta^T E(ee^T|V)\beta\}\Sigma$ and the asymptotic covariance matrix equals $\Sigma^{-1}\{\sigma^2 + \lambda\beta^T E(ee^T|V)\beta\}$. This matrix can be estimated by a standard sandwich procedure. A similar situation applies to the asymptotic covariance matrix, $\Sigma^{-1}\{\sigma^2 + \beta^T E(ee^T|V)\beta\}$, of the de-noise model.

2.3. Estimation of the nonparametric components. After obtaining the estimate $\hat\Theta_n$, we can estimate $a_j(u)$ and $b_j(u)$ for $j = 1, \ldots, q$, and then $\alpha_j(u)$. Write $\Psi(u) = \{a_1(u), \ldots, a_q(u), b_1(u), \ldots, b_q(u)\}^T$. An estimator of the nonparametric components $\Psi(u)$ is defined as

(2.4)    $\hat\Psi(u) = H^{-1}(D_u^T W_u D_u)^{-1} D_u^T W_u (Y - \hat Z\hat\Theta_n).$

Correspondingly, $\alpha(u)$ is estimated by $\hat\alpha(u) = (I_q, 0_q)(D_u^T W_u D_u)^{-1} D_u^T W_u (Y - \hat Z\hat\Theta_n)$, where $I_q$ is the $q \times q$ identity matrix and $H = \mathrm{diag}(1, h) \otimes I_q$. We have the following asymptotic representation for the resulting estimator:

Theorem 3. Under Assumptions 1-5 given in the Appendix, we have

$$\sqrt{nh}\,H\{\hat\Psi(u_0) - \Psi(u_0)\} = [\ldots] + o(n^{1/2}h^{5/2} + n^{1/2}h^{1/2}b^{r+1}).$$

Based on this representation, we can derive the asymptotic normality of 
the proposed nonparametric estimators of the varying coefficient functions. 
The proof is straightforward but tedious. We omit the details. 
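
The estimator of $\alpha(u)$ in (2.4) amounts to a local linear regression of the partial residuals $Y - \hat Z\hat\Theta_n$ on $X$. A minimal sketch follows, assuming a Gaussian kernel; the function name `estimate_alpha` is hypothetical and used only for illustration.

```python
import numpy as np


def estimate_alpha(u0, Y, Z_hat, theta_hat, U, X, h):
    """Local linear estimate of alpha(u0) from the partial residuals.

    Solves (D_u^T W_u D_u)^{-1} D_u^T W_u (Y - Z_hat theta_hat) and keeps
    the first q coordinates, i.e. the (I_q, 0_q) block.
    """
    U = np.asarray(U, dtype=float)
    X = np.asarray(X, dtype=float)
    resid = np.asarray(Y, dtype=float) - np.asarray(Z_hat, dtype=float) @ theta_hat
    d = (U - u0) / h
    D = np.hstack([X, X * d[:, None]])                   # n x 2q design D_u
    w = np.exp(-0.5 * d**2) / (h * np.sqrt(2 * np.pi))   # Gaussian K_h (assumed)
    DtW = D.T * w
    psi = np.linalg.solve(DtW @ D, DtW @ resid)
    return psi[: X.shape[1]]
```

Because the local model is linear in $U - u_0$, the fit is exact when the coefficient functions are linear and the data are noise-free, whatever the bandwidth.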
For notational simplicity, we assume that $\varepsilon$ is independent of $(Z, X, U)$ and $e$ is independent of $(V, U)$ in the remaining part of this paper.

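Theorem 4. Under Assumptions 1-5, we have

$$\sqrt{nh}\Big[\hat\alpha(u_0) - \alpha(u_0)\;[\ldots]\;\frac{b^{r+1}}{(r + 1)!}\,\Gamma^{-1}(u_0)E[X\{\xi^{(r+1)}(V)\}^T\beta_0 \mid U = u_0] + o(h^2 + b^{r+1})\Big] \to N(0, \Sigma_2),$$

as $n \to \infty$, where $\Sigma_2 = f_U^{-1}(u_0)\{\sigma^2\Gamma^{-1}(u_0) + \Gamma^{-1}(u_0)\Sigma^*\Gamma^{-1}(u_0)\} \otimes G$, $G = (\mu_2 - \mu_1^2)^{-2}\,[\ldots]$, $\Sigma^* = \beta^T\Sigma_e\beta\,\Lambda(u_0)$, $\Lambda(u_0) = (E[\{E(X|V)\}|U = u_0])^{\otimes 2}$, $q_0 = \mu_2/(\mu_2 - \mu_1^2)$, $q_1 = -\mu_1/(\mu_2 - \mu_1^2)$.

Furthermore, if $nhb^{2r+2} \to 0$, then

$$\sqrt{nh}\Big[\hat\alpha(u) - \alpha(u) - \frac{h^2}{2}\,\frac{[\ldots]}{\mu_2 - \mu_1^2}\,\alpha''(u) + o(h^2 + b^{r+1})\Big] \to N(0, \Sigma_2^*),$$

where $\Sigma_2^* = \sigma^2(q_0^2\nu_0 + 2q_0q_1\nu_1 + q_1^2\nu_2)\{\Gamma^{-1}(u_0) + \Gamma^{-1}(u_0)\Sigma^*\Gamma^{-1}(u_0)\}/f_U(u)$.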

The first term of $\Sigma_2$ is the asymptotic covariance of the usual profile likelihood estimator of Cai, Fan and Li (2000), when $\xi$ is observed. The second term is attributed to calibrating the error-prone covariates. In the error-in-variable model with validation data, if $X$ is independent of $V$ and $E(X) = 0$, the measurement errors have no impact on the covariance $\Sigma_2$. Theorem 4 also indicates that if $\max(h^{5/2}, b^{r+1}) \to 0$, the bias of $\hat\alpha(u)$ tends to zero and $\hat\alpha(u)$ is asymptotically normally distributed with rate $(nh)^{1/2}$.

After obtaining $\hat\Theta_n$ and $\hat\alpha(u)$, one can easily give an estimator of the variance $\sigma^2$ of the error $\varepsilon$:

$$\hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^{n}\{Y_i - \hat\beta_n^T\hat\xi_i - \hat\theta_n^T W_i - \hat\alpha^T(U_i)X_i\}^2.$$

In our simulation, a simple version of $\hat\sigma^2$ is used. Note that $S$ depends only on the observations $\{(U_i, X_i)\}_{i=1}^n$, and we can derive a "synthetic linear model," that is, $Y - Z\Theta = M + \varepsilon$, where $M = \alpha^T(U)X$. A straightforward derivation yields $(I - S)Y = (I - S)Z\Theta + (I - S)\varepsilon$. Standard regression gives the least-square estimate $\hat\Theta$ and then $\hat M = S(Y - Z\hat\Theta)$. Note that $Z$ is not always observed. Replacing $Z$ by its estimate $\hat Z$, we obtain a consistent estimator $\hat M$ of $M$; that is, $\hat M = S(Y - \hat Z\hat\Theta)$. A consistent estimator of $\sigma^2$ may be defined as $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat\Theta^T\hat Z_i - \hat M_i)^2$, where $\hat M_i$ is the $i$th element of $\hat M$.

2.4. Bandwidth selection. The proposed procedure involves two bandwidths, $h$ and $b$, to be selected. To derive asymptotic distributions of the proposed estimators, we theoretically impose rates of convergence on the bandwidths. It is worthwhile to point out that undersmoothing is necessary when we estimate $\xi$, so the optimal bandwidth for $b$ is ruled out.

As mentioned before, the optimal bandwidth for $b$ cannot be used because undersmoothing the nonparametric estimators of the covariates is necessary: undersmoothing $\hat\xi$ keeps the bias small, which precludes the optimal bandwidth for $b$. The asymptotic variances of the proposed estimators for constant coefficients depend on neither the bandwidth nor the kernel function. Hence, we can use the similar method of mixture of higher-order theoretical expansions proposed by Sepanski, Knickerbocker and Carroll (1994), or the typical curves approach by Brookmeyer and Liao (1992), to select the bandwidth $b$. As done by Sepanski, Knickerbocker and Carroll (1994), a suitable bandwidth is $b = Cn^{-1/3}$, where $C$ is a constant depending on the unknown function $\xi(v)$ and its second derivative. In practice, one can use a plug-in rule to estimate the constant $C$. A useful and simple candidate for $C$ is $\hat\sigma_V$, the sample standard deviation of $V$. This method is fairly effective and easy to implement. In our simulation example, the bandwidth is $b = \hat\sigma_V n^{-1/3}$. Based on the asymptotic analysis and empirical experience for the fixed time case (i.e., de-noise models), we suggest a simple rule of thumb as follows: The smoothing parameter $b$ is chosen so that intervals of size $2b$ contain around 5 points for $n$ up to 100 and between $8^{-1}n^{2/3}$ and $4^{-1}n^{2/3}$ points for larger $n$.

We use the "leave one sample out" method to select the bandwidth $h$. This method has been widely applied in practice; see, for example, Cai, Fan and Li (2000) and Fan and Huang (2005). We define the cross-validation score for $h$ as $CV(h) = n^{-1}\sum_{i=1}^{n}\{Y_i - \hat\alpha_{h,-i}^T(U_i)X_i - \hat\Theta_{n,-i}^T\hat Z_i\}^2$, where $\hat\Theta_{n,-i}$ is the profile least-square-based estimator defined by (2.3), computed from the data with measurements of the $i$th observation deleted, and $\hat\alpha_{h,-i}(\cdot)$ is the estimator defined in (2.4) with $\hat\Theta_n$ replaced by $\hat\Theta_{n,-i}$. The likelihood cross-validation smoothing parameter $h_{CV}$ is the minimizer of $CV(h)$. That is, $h_{CV} = \arg\min_h CV(h)$.
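
The leave-one-out criterion can be sketched by brute force: refit the profile estimator without observation $i$, predict $Y_i$, and average the squared errors over a grid of candidate bandwidths. The code below is an illustrative sketch under assumptions, not the authors' implementation: it uses a Gaussian kernel, takes the calibrated covariates as given, and the names `cv_score` and `select_h` are hypothetical.

```python
import numpy as np


def _alpha_hat(u0, resp, U, X, h):
    """Local linear fit at u0; resp may be a vector or a matrix of responses."""
    d = (U - u0) / h
    D = np.hstack([X, X * d[:, None]])
    w = np.exp(-0.5 * d**2)            # kernel constant cancels in the fit
    DtW = D.T * w
    coef = np.linalg.solve(DtW @ D, DtW @ resp)
    return coef[: X.shape[1]]


def _theta_hat(Y, Z, U, X, h):
    """Profile least squares: smooth Y and Z on (U, X), regress the residuals."""
    resp = np.column_stack([Y, Z])
    fit = np.array([X[i] @ _alpha_hat(U[i], resp, U, X, h) for i in range(len(Y))])
    theta, *_ = np.linalg.lstsq(Z - fit[:, 1:], Y - fit[:, 0], rcond=None)
    return theta


def cv_score(h, Y, Z, U, X):
    """Leave-one-out prediction error CV(h)."""
    n = len(Y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        theta_i = _theta_hat(Y[keep], Z[keep], U[keep], X[keep], h)
        alpha_i = _alpha_hat(U[i], Y[keep] - Z[keep] @ theta_i, U[keep], X[keep], h)
        errs[i] = Y[i] - X[i] @ alpha_i - Z[i] @ theta_i
    return np.mean(errs**2)


def select_h(grid, Y, Z, U, X):
    """Return the grid value minimizing CV(h), together with all scores."""
    scores = [cv_score(h, Y, Z, U, X) for h in grid]
    return grid[int(np.argmin(scores))], scores
```

The cost grows roughly like $n^2$ local fits per candidate bandwidth, so in practice a coarse grid is a reasonable starting point.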
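3. Tests for parametric and nonparametric components.

3.1. Test for parametric components. An interesting question is to consider the following hypothesis:

(3.1)    $H_0: A\Theta = 0$  vs  $H_1: A\Theta \ne 0,$

where $A$ is a given $l \times p$ full rank matrix.

Let $\hat\Theta_0 = (\hat\beta_0^T, \hat\theta_0^T)^T$ be the estimator of $\Theta$ and $\hat\alpha_0(\cdot)$ be the estimator of $\alpha(\cdot)$ under the null hypothesis. Denote $RSS_0 = \sum_{i=1}^{n}\{Y_i - \hat\beta_0^T\hat\xi_i - \hat\theta_0^T W_i - \hat\alpha_0^T(U_i)X_i\}^2$. $RSS_0$ can be further expressed as $\sum_{i=1}^{n}[Y_i - \hat\beta_0^T\hat\xi_i - \hat\theta_0^T W_i - \{S(Y - \hat Z\hat\Theta_0)\}_i]^2$, where $\hat\Theta_0 = \hat\Theta - (\tilde Z^T\tilde Z)^{-1}A^T\{A(\tilde Z^T\tilde Z)^{-1}A^T\}^{-1}A\hat\Theta$, and $\hat\Theta = (\tilde Z^T\tilde Z)^{-1}\tilde Z^T\tilde Y$, an estimator of $\Theta$ without the restriction, with $\tilde Z = (I - S)\hat Z$ and $\tilde Y = (I - S)Y$.

Similarly, let $\hat\Theta_1 = (\hat\beta_1^T, \hat\theta_1^T)^T$ and $\hat\alpha_1(\cdot)$ be the estimators of $\Theta$ and $\alpha(\cdot)$ under the alternative hypothesis, respectively. Denote $RSS_1 = \sum_{i=1}^{n}\{Y_i - \hat\beta_1^T\hat\xi_i - \hat\theta_1^T W_i - \hat\alpha_1^T(U_i)X_i\}^2$, which can be expressed as $\sum_{i=1}^{n}[Y_i - \hat\beta_1^T\hat\xi_i - \hat\theta_1^T W_i - \{S(Y - \hat Z\hat\Theta_1)\}_i]^2$. Following Fan and Huang (2005), we define a profile least-square-based ratio test by

$$T_n = \frac{n}{2}\,\frac{RSS_0 - RSS_1}{RSS_1}.$$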
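
Once the smoothed quantities $\tilde Y = (I - S)Y$ and $\tilde Z = (I - S)\hat Z$ are available, the restricted estimator and the ratio statistic reduce to ordinary linear algebra, since both residual sums of squares equal $\|\tilde Y - \tilde Z\hat\Theta\|^2$ evaluated at the corresponding estimate. The following is a minimal sketch; the function name `profile_ratio_test` is hypothetical, and the inputs are assumed to have been computed beforehand.

```python
import numpy as np


def profile_ratio_test(Y_t, Z_t, A):
    """Restricted and unrestricted profile estimates and the statistic T_n.

    Y_t : (n,)   smoothed response (I - S) Y
    Z_t : (n, p) smoothed covariates (I - S) Z_hat
    A   : (l, p) full-rank restriction matrix of H0: A Theta = 0
    """
    n = len(Y_t)
    G = np.linalg.inv(Z_t.T @ Z_t)
    theta1 = G @ Z_t.T @ Y_t                                   # unrestricted
    adj = G @ A.T @ np.linalg.solve(A @ G @ A.T, A @ theta1)
    theta0 = theta1 - adj                                      # satisfies A theta0 = 0
    rss1 = np.sum((Y_t - Z_t @ theta1) ** 2)
    rss0 = np.sum((Y_t - Z_t @ theta0) ** 2)
    T_n = 0.5 * n * (rss0 - rss1) / rss1
    return theta1, theta0, T_n
```

By construction the restricted estimate satisfies the null constraint exactly, and $RSS_0 \ge RSS_1$, so $T_n$ is nonnegative up to rounding.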

Under their set-up, Fan and Huang (2005) showed that the statistic $T_n$ is the profile likelihood ratio when the error is normally distributed. In the present situation, because of the effect of measurement error on variables, no central $\chi^2$-distribution similar to that of Fan and Huang (2005) is available. However, we can still prove that $2T_n$ has an asymptotic noncentral $\chi^2$ distribution under the alternative hypothesis of (3.1), which we summarize in the following theorem.

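Theorem 5. Suppose that Assumptions 1-5 in the Appendix are satisfied and $nb^{2r+2} \to 0$, as $n \to \infty$. Under the alternative hypothesis of (3.1),

$$2T_n - n\sigma^{-2}\Theta^T A^T(A\Sigma^{-1}A^T)^{-1}A\Theta \to \sum_{i=1}^{l}\omega_i\chi_{i1}^2,$$

where $\omega_i$ for $1 \le i \le l$ are the eigenvalues of $(\sigma^2 A\Sigma^{-1}A^T)^{-1}(A\Sigma_1 A^T)$ and $\chi_{i1}^2$ is the central $\chi^2$ distribution with 1 degree of freedom. Furthermore, let $\hat\Sigma_1$ and $\hat\Sigma$ be consistent estimators of $\Sigma_1$ and $\Sigma$, respectively. Then $2\varrho_n T_n \to \chi_l^2(\lambda)$, where $\varrho_n = l/\mathrm{tr}\{(\hat\sigma^2 A\hat\Sigma^{-1}A^T)^{-1}(A\hat\Sigma_1 A^T)\}$, $\chi_l^2(\lambda)$ is the noncentral $\chi^2$ distribution with $l$ degrees of freedom, and the noncentral parameter $\lambda = \sigma^{-2}\varrho\lim_{n\to\infty} n\Theta^T A^T(A\Sigma^{-1}A^T)^{-1}A\Theta$ with $\varrho = l/\mathrm{tr}\{(\sigma^2 A\Sigma^{-1}A^T)^{-1}(A\Sigma_1 A^T)\}$.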
In a similar way, we may construct the Wald test for hypothesis (3.1) as $W_n = \hat\Theta^T A^T(A\hat\Sigma_1 A^T)^{-1}A\hat\Theta$, and demonstrate that $W_n$ and $2\varrho_n T_n$ have the same asymptotic distribution under the alternative hypothesis. These properties can therefore be used to calculate the power of the proposed tests.

Example 3 (cont.). Generalize (1.6) to a more flexible model:

$$Y_t = \beta^T E(\eta_t|V_t) + \zeta^T\eta_t + \gamma^T S_t + \alpha(U_t)X_t + \varepsilon_t.$$

Write $\Theta = (\beta^T, \zeta^T, \gamma^T)^T$ and $Z = \{E(\eta^T|V), \eta^T, S^T\}^T$. The hypothesis (1.7) is equivalent to

(3.2)    $H_0: A\Theta = 0$  vs  $H_1: A\Theta \ne 0,$

where $A = (1_{p_1}, -1_{p_1}, 0)$ and $1_{p_1}$ is the $p_1$-variate vector with all entries 1. This is an expression of (3.1). As a consequence, the proposed profile least-square-based ratio test and Wald test can be applied to test this hypothesis. For hypothesis (3.2), one may also propose a Wald-type statistic: $W_n(h) = \hat\Theta^T A^T(A\hat\Sigma_h A^T)^{-1}A\hat\Theta$, where $\hat\Sigma_h = \hat\Sigma^{-1}(\hat\sigma^2 + \hat\beta^T\hat\Sigma_e\hat\beta)$. It can be proved that $2\varrho_n T_n$ and $W_n$ have the same asymptotic $\chi^2$ distribution.

3.2. Tests for the nonparametric part and wild bootstrap version. It is also of interest to check whether the varying-coefficient functions $\alpha(u)$ in (1.2) are parametric functions. Specifically, we consider the following hypothesis:

$$H_0: \alpha_i(U) = \alpha_i(U, \gamma) \quad \text{vs} \quad H_1: \alpha_i(U) \ne \alpha_i(U, \gamma), \qquad i = 1, 2, \ldots, q,$$

where $\gamma$ is an unknown vector and $\alpha_i(\cdot, \cdot)$ is a known function, $i = 1, 2, \ldots, q$. For simplicity of presentation, we test the homogeneity:

$$H_0: \alpha_1(U) = \alpha_1, \ldots, \alpha_q(U) = \alpha_q.$$

Let $\hat\alpha_1, \ldots, \hat\alpha_q$ and $\hat\Theta$ be the profile estimators under $H_0$. The weighted residual sum of squares under $H_0$ is $RSS(H_0) = \sum_{i=1}^{n} w_i(Y_i - \sum_{j=1}^{q}\hat\alpha_j X_{ij} - \hat\Theta^T\hat Z_i)^2$, where $w_i$ are weights such that $\sum_{i=1}^{n} w_i = 1$ and $w_i \ge 0$. In general, the weight function $w$ has a compact support, designed to reduce the boundary effects on the test statistics. When $\sigma^2(Z, X, U) = v(Z, X, U)\sigma^2$ for some known function $v(Z, X, U)$, we may choose $w_i = v^{-1}(Z_i, X_i, U_i)$. See Fan, Zhang and Zhang (2001) and Fan and Jiang (2007) for a similar argument.

Under the general alternative that all the varying-coefficient functions are allowed to vary with the random variable $U$, we use the local likelihood method to obtain the estimators $\hat\Theta$ and $\hat\alpha(U)$. Therefore, the corresponding weighted residual sum of squares is

$$RSS(H_1) = \sum_{i=1}^{n} w_i\Big\{Y_i - \sum_{j=1}^{q}\hat\alpha_j(U_i)X_{ij} - \hat\Theta^T\hat Z_i\Big\}^2.$$

In a similar way to that used in Section 3.1, we propose a generalized likelihood ratio (GLR) statistic: $T_{GLR} = \{RSS(H_0) - RSS(H_1)\}/RSS(H_1)$. Under mild assumptions, one can derive the asymptotic distribution of $T_{GLR}$. This distribution can be used to gain the empirical level. See Fan, Zhang and Zhang (2001) for a related discussion.

These arguments can be applied to the following partially parametric null hypothesis: $H_0: \alpha_1(U) = \alpha_1, \ldots, \alpha_r(U) = \alpha_r$, $r < q$. The difference is only the definition of $RSS(H_0)$, for which we use the profile likelihood procedure to estimate the constant coefficients $\alpha_i$, $i = 1, 2, \ldots, r$, and $\Theta$, and use the profile linear procedure to estimate the nonparametric components $\alpha_i(\cdot)$, $i = r + 1, \ldots, q$, under the null hypothesis.

Although the asymptotic level of $T_{GLR}$ is available, $T_{GLR}$ may not perform well when sample sizes are small. For this reason and for practical purposes, we suggest using a bootstrap procedure. To be specific, let $\hat\varepsilon_i = Y_i - \hat\Theta^T\hat Z_i - \hat\alpha^T(U_i)X_i$ be the residuals based on estimators (2.3) and (2.4) for parametric and nonparametric parts, respectively. We use the wild bootstrap [Wu (1986), Härdle and Mammen (1993)] method to calculate the critical values for the test $T_{GLR}$. Let $\tau$ be a random variable with a distribution function $F(\cdot)$ such that $E\tau = 0$, $E\tau^2 = 1$ and $E|\tau|^3$ is finite. We generate the bootstrap residuals $\varepsilon_i^* = \hat\varepsilon_i\tau_i$, where $\tau_i$ is independent of $\hat\varepsilon_i$. Define the bootstrap version $T_{GLR}^*$ like $T_{GLR}$ based on the bootstrap sample $(Y_i^*, X_i, \hat Z_i, U_i)$, where $Y_i^* = \hat\Theta^T\hat Z_i + \hat\alpha^T(U_i)X_i + \varepsilon_i^*$ for $i = 1, 2, \ldots, n$. On the basis of the distribution of $T_{GLR}^*$, we obtain the $(1 - \alpha)$ quantile $t_{1-\alpha}^*$ and reject the parametric hypothesis if $T_{GLR} > t_{1-\alpha}^*$.
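
The resampling loop of the wild bootstrap can be sketched generically. The text only requires weights with mean zero, unit variance and finite third absolute moment; the two-point distribution of Mammen used below is one common choice and is an assumption here, not something the text prescribes. The names `mammen_weights` and `wild_bootstrap_critical_value` are hypothetical, and `statistic` stands for any user-supplied function that recomputes the test statistic from a bootstrap response vector.

```python
import numpy as np


def mammen_weights(n, rng):
    """Two-point weights with mean 0 and variance 1 (Mammen's distribution)."""
    s5 = np.sqrt(5.0)
    low, high = -(s5 - 1.0) / 2.0, (s5 + 1.0) / 2.0
    p_low = (s5 + 1.0) / (2.0 * s5)              # P(tau = low)
    return np.where(rng.random(n) < p_low, low, high)


def wild_bootstrap_critical_value(fitted, resid, statistic, level=0.05, B=500, seed=0):
    """Approximate the (1 - level) quantile of the bootstrap statistic.

    fitted    : (n,) fitted values Theta_hat^T Z_hat_i + alpha_hat(U_i)^T X_i
    resid     : (n,) residuals eps_hat_i
    statistic : callable mapping a bootstrap response y* to a scalar
    """
    rng = np.random.default_rng(seed)
    stats = np.empty(B)
    for k in range(B):
        y_star = fitted + resid * mammen_weights(len(resid), rng)  # Y_i^*
        stats[k] = statistic(y_star)
    return np.quantile(stats, 1.0 - level)
```

The observed statistic is then compared with the returned quantile; keeping the covariates fixed and perturbing only the residuals preserves possible heteroscedasticity in the errors.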

4. Numerical examples.

4.1. Performance of the proposed estimators. In this section, we conduct simulation experiments to illustrate the finite sample performance of the proposed estimators and tests. Our simulated data were generated from the following model:

(4.1)    $Y = \beta_1\xi + \beta_2 W_1 + \beta_3 W_2 + \alpha_1(U)X_1 + \alpha_2(U)X_2 + \varepsilon.$

$W_1$ and $W_2$ are bivariate normal with marginal mean zero, marginal variance 1 and correlation $1/\sqrt{5}$, while $X_1$ and $X_2$ are independent and normal with mean zero and variance 0.8. The unobserved covariate $\xi$ is related to the auxiliary variable $(\eta, V)$ through $\xi(V) = -2\cos(4\pi V)$ and $\eta = \xi(V) + e$. $V$ is a uniform random variable on $[0, 1]$ and $U$ is a uniform random variable on $[0, 3]$. The errors $\varepsilon$ and $e$ are independent of each other and normal variables with mean 0 and variances $\sigma_\varepsilon^2$ and $\sigma_e^2$, respectively. The varying-coefficient functions are

(4.2)    $\alpha_1(U) = \exp(-U^2) + \sin(\pi U)$  or

(4.3)    $\alpha_1(U, \varrho) = m + \varrho\{\alpha_1(U) - m\},$

(4.4)    $\alpha_2(U) = -\cos(2\pi U),$

where $m = \int_0^3 \alpha_1(t)\,dt/3$, and $\varrho$ is chosen from the set $\{0.0, 0.2, 0.5, 0.7, 1.0\}$.
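
For readers who wish to experiment with a design of this type, the following sketch generates one data set in the spirit of (4.1), (4.2) and (4.4). Several ingredients are assumptions made for illustration only: the forms $\xi(V) = -2\cos(4\pi V)$ and $\alpha_2(U) = -\cos(2\pi U)$, the correlation $1/\sqrt{5}$, and the default error variances. The function name `simulate` is hypothetical.

```python
import numpy as np


def simulate(n=100, beta=(0.2, -1.0, 1.0), sigma2_eps=1.0, sigma2_e=2.0, seed=0):
    """Draw one sample from a model of the form (4.1); settings are assumptions."""
    rng = np.random.default_rng(seed)
    rho = 1.0 / np.sqrt(5.0)                       # assumed correlation of (W1, W2)
    W = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    X = rng.normal(0.0, np.sqrt(0.8), size=(n, 2))
    V = rng.uniform(0.0, 1.0, n)
    U = rng.uniform(0.0, 3.0, n)
    xi = -2.0 * np.cos(4.0 * np.pi * V)            # assumed calibration function
    e = rng.normal(0.0, np.sqrt(sigma2_e), n)
    eta = xi + e                                   # observed surrogate
    a1 = np.exp(-U**2) + np.sin(np.pi * U)         # as in (4.2)
    a2 = -np.cos(2.0 * np.pi * U)                  # assumed form of (4.4)
    eps = rng.normal(0.0, np.sqrt(sigma2_eps), n)
    Y = (beta[0] * xi + beta[1] * W[:, 0] + beta[2] * W[:, 1]
         + a1 * X[:, 0] + a2 * X[:, 1] + eps)
    return {"Y": Y, "eta": eta, "xi": xi, "e": e, "V": V, "W": W, "X": X, "U": U}
```

In a Monte Carlo study one would call `simulate` repeatedly with different seeds and compare estimators that use `xi` (benchmark), a calibrated version of `eta` (proposed) and `eta` itself (naive).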

The sample size was 100. We generated 500 data sets in each case, applying to each simulated sample the bootstrap test proposed for the parametric part based on 500 bootstrap repetitions. The Gaussian kernel was used in this example. The bandwidth $h$ was chosen by the leave-one-out cross-validation method described in Section 2.4 and the bandwidth $b$ was selected as $b = \hat\sigma_V n^{-1/3}$, where $\hat\sigma_V$ is the sample standard deviation of $V$.

We consider four scenarios. In the first three scenarios $\sigma_\varepsilon^2 = 1$ and $\sigma_e^2 = 2$.

(i) $\beta = (0, c - 1, 1)^T$ for $c \in \{0, 0.1, 0.2, 0.25, 0.5, 0.7, 1.0\}$ and $\alpha_1(u)$ and $\alpha_2(u)$ are given in (4.2) and (4.4);

(ii) $\beta = (0, -0.8, 1)^T$ and $\alpha_1(u)$ and $\alpha_2(u)$ are given in (4.3) and (4.4) with $\varrho \in \{0.0, 0.2, 0.5, 0.7, 1.0\}$;

(iii) $\beta = (0.2, -1, 1)^T$ and $\alpha_1(u)$ and $\alpha_2(u)$ are the same as in (ii);

(iv) The setting is the same as that of (iii), but the signal-noise ratio $r = \sigma_\xi^2/(\sigma_\xi^2 + \sigma_e^2)$ varies from 0.3 to 0.8 by 0.1.

The corresponding results are presented in Tables 1-4, in which we display the estimated values and associated standard errors, standard deviations, and coverage probabilities based on the benchmark estimator (i.e., all covariates measured exactly), the proposed estimator and the naive estimator ($\eta_i$ directly used as the covariates). We summarize our findings as follows:

When $\beta_1 = 0$ [scenarios (i) and (ii)], all estimates are close to the true values regardless of the nonparametric functions $\alpha_1(u)$ and $\alpha_2(u)$. The differences among the estimated values based on the three methods are slight and can be ignored. However, when $\beta_1 = 0.2$, the estimates of $\beta_1$ based on the naive method have severe biases and the associated coverage probabilities are also substantially smaller than 0.95. These biases were not reduced when the sample size was increased (not listed here), whereas the proposed estimation procedure performs well. On the other hand, the estimates of $\beta_2$ and $\beta_3$ are similar based on the three methods. From Table 4, we can see that the naive estimator of $\beta_1$ has zero coverage probabilities when $r = 0.3$, while the proposed estimator has fairly reasonable coverage probabilities. With an increase of $r$, it is readily seen that coverage probabilities of the proposed estimator are closer to the nominal level, which indicates the proposed method is promising.



Table 1 

Results of simulation study for scenario (i) 
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Note: 


"Est" is 


the simulation mean; 


"SE" 


is the mean 


of the estimated standard error: 


"SD" 


is the mean 


of the estimated standard 



deviation; and "GOV" is the coverage probability of a nominal 95% confidence interval. The methods used are "B" for the benchmark 
method, "P" for the proposed method, and "N" for the naive method. 
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Table 2 

Results of simulation study for scenario (ii) 
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Table 3 

Results of simulation study for scenario (iii) 
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Table 4 

Results of simulation study for scenario (iv) 
                 $\beta_1$                         $\beta_2$                          $\beta_3$
r      Method    Est     SE      SD      COV       Est      SE      SD      COV       Est     SE      SD      COV
…      …         …       …       …       …         …        …       …       …         …       0.127   0.141   0.950
       N         0.116   0.033   0.030   0.200     -0.988   0.137   0.133   0.930     1.020   0.132   0.135   0.930
0.60   B         0.194   0.035   0.040   0.970     -0.993   0.136   0.141   0.950     1.025   0.137   0.138   0.930
       P         0.192   0.038   0.040   0.950     -0.994   0.140   0.140   0.950     1.025   0.138   0.137   0.910
       N         0.131   0.028   0.032   0.450     -0.998   0.152   0.137   0.910     1.020   0.138   0.134   0.940
0.70   B         0.198   0.038   0.039   0.960     -1.018   0.137   0.133   0.970     1.004   0.140   0.131   0.930
       P         0.194   0.040   0.038   0.950     -1.017   0.138   0.132   0.960     1.004   0.142   0.131   0.930
       N         0.152   0.038   0.033   0.660     -1.021   0.142   0.130   0.920     1.004   0.144   0.128   0.920
0.80   B         0.203   0.036   0.038   0.950     -1.001   0.142   0.132   0.930     1.005   0.136   0.132   0.960
       P         0.203   0.038   0.038   0.950     -1.002   0.143   0.131   0.940     1.005   0.135   0.132   0.960
       N         0.172   0.035   0.035   0.870     -1.002   0.147   0.131   0.920     1.000   0.136   0.131   0.930

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4.2. Performance of the proposed tests. We now explore the numerical 
performance of the proposed tests. First, we test a hypothesis on the
parametric component of the form

(4.5) $H_0: A\beta = 0$ versus $H_1: A\beta = c$,

where $A = (1,1,1)^{\top}$, $c$ is a value from the set
$\{0, 0.1, 0.2, \ldots, 0.7, 1\}$, $\beta = (0.2, c - 1.2, 1)^{\top}$, and
$\alpha_1(\cdot)$ and $\alpha_2(\cdot)$ are the same as those in scenario (i).
The same models and error distribution as in Section 4.1 are used.

The power to detect $H_1$ was calculated by using the critical values from
the chi-squared approximation and from the wild bootstrap approximation. To
compare test performances, the powers of the tests based on the benchmark
estimator, the proposed estimator and the naive estimator are presented.
In implementing the wild bootstrap method, we generated 500 bootstrap
samples from the model

$Y_i^* = \hat\beta_1\hat\xi_i + \hat\beta_2 W_{1i} + \hat\beta_3 W_{2i} + \hat\alpha_1(U_i)Z_{1i} + \hat\alpha_2(U_i)Z_{2i} + \varepsilon_i^*$,

where $\varepsilon_i^*$ is a wild bootstrap residual; that is,
$\varepsilon_i^* = \tau_i\hat\varepsilon_i$, with
$\hat\varepsilon_i = Y_i - \{\hat\beta_1\hat\xi_i + \hat\beta_2 W_{1i} + \hat\beta_3 W_{2i} + \hat\alpha_1(U_i)Z_{1i} + \hat\alpha_2(U_i)Z_{2i}\}$,
$\tau_i = -(\sqrt{5}-1)/2$ with probability $(\sqrt{5}+1)/(2\sqrt{5})$ and
$\tau_i = (\sqrt{5}+1)/2$ with probability $1 - (\sqrt{5}+1)/(2\sqrt{5})$.
Using this bootstrap sample $(Y_i^*, \hat\xi_i, W_i, Z_i, U_i)$, we can
calculate $T_n^*$ and $W_n^*$ and take their 95th percentiles as the critical
values for the proposed tests at the significance level 0.05.
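The wild bootstrap step described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the implementation used in the paper: `fitted` and `residuals` stand for the fitted values and residuals of the estimated model, and `statistic` is a hypothetical user-supplied function that recomputes the test statistic (for example $T_n^*$ or $W_n^*$) from a bootstrap response vector. Only the two-point multiplier distribution and the default of 500 bootstrap samples are taken from the text.

```python
import math
import random

SQRT5 = math.sqrt(5.0)
TAU_LOW = -(SQRT5 - 1.0) / 2.0          # taken with probability P_LOW
TAU_HIGH = (SQRT5 + 1.0) / 2.0          # taken with probability 1 - P_LOW
P_LOW = (SQRT5 + 1.0) / (2.0 * SQRT5)


def draw_multipliers(n, rng):
    """Draw n two-point multipliers tau_i (mean 0, variance 1)."""
    return [TAU_LOW if rng.random() < P_LOW else TAU_HIGH for _ in range(n)]


def wild_bootstrap_critical_value(fitted, residuals, statistic,
                                  n_boot=500, level=0.05, seed=0):
    """Return the empirical (1 - level) quantile of the bootstrap statistics.

    `statistic` is a hypothetical callable mapping a bootstrap response
    vector to a scalar test statistic.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        taus = draw_multipliers(len(residuals), rng)
        # Bootstrap response: fitted value plus rescaled residual.
        y_star = [f + t * e for f, t, e in zip(fitted, taus, residuals)]
        stats.append(statistic(y_star))
    stats.sort()
    # Smallest order statistic with at least (1 - level) of the mass below it.
    k = math.ceil((1.0 - level) * n_boot) - 1
    k = min(n_boot - 1, max(0, k))
    return stats[k]
```

The multipliers have mean zero and unit variance, so each bootstrap residual keeps the first two conditional moments of the original residual, which is the property that lets the bootstrap distribution mimic the null distribution of the test statistic.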

The power of $T_n$ for scenario (iii) is presented in Table 5 for
$\beta_1 = 0.2$. Note that the power is actually the empirical level when
$c = 0$. All empirical levels are close to the nominal level 0.05, and the
empirical levels based on the wild bootstrap procedure are consistently
smaller than those based on the $\chi^2$ approximation and are closer to the
nominal level. The same holds for $\beta_1 = 0$ (not listed here). As $c$
increases to 0.7, the powers of the two tests based on the $\chi^2$
approximation are at least 0.92. Similar conclusions can be drawn for the
Wald test, whose simulation results are also given in Table 5.

We further study the numerical performance of the test by checking the 
nonparametric component. We consider the following hypothesis: 

(4.6) $H_0: \alpha_1(u) = m$ versus $\alpha_1(u) = \alpha_1(u, \varrho)$ given by (4.3).

The simulation results obtained by using the wild bootstrap approximation
to choose the critical value are shown in Table 6. When $\varrho = 0$, the
results are the empirical levels, which are close to the nominal level. The
power is at least 0.99 when $\varrho = 0.5$. Table 6 also indicates that the
power increases monotonically with $\varrho$.
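The empirical levels and powers discussed here are rejection proportions over simulation replications. The sketch below shows one way to compute such a proportion together with its Monte Carlo standard error. It is a hedged illustration only: the function name `empirical_power` is hypothetical, the per-replication critical values may come from either the $\chi^2$ approximation or the bootstrap, and the number of replications is whatever the simulation used (it is not stated in this section).

```python
import math


def empirical_power(test_statistics, critical_values):
    """Rejection proportion and its Monte Carlo standard error.

    test_statistics : observed statistic in each replication
    critical_values : critical value used in each replication
    """
    rejections = [t > c for t, c in zip(test_statistics, critical_values)]
    n_rep = len(rejections)
    power = sum(rejections) / n_rep
    # Binomial standard error of the estimated rejection probability.
    mc_se = math.sqrt(power * (1.0 - power) / n_rep)
    return power, mc_se
```

Reporting the standard error next to the rejection proportion helps judge whether an empirical level such as 0.06 or 0.08 differs from the nominal 0.05 by more than simulation noise.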
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Table 5 

Empirical power of the profile least-square ratio test $T_n$ and the Wald test
$W_n$ at level 0.05 for hypothesis (4.5). Data were generated from model (4.1)
with $\beta = (0.2, c - 1.2, 1)^{\top}$ and
$c \in \{0, 0.1, 0.2, 0.25, 0.5, 0.7, 1\}$, and $\alpha_1(u)$ and
$\alpha_2(u)$ given by (4.2) and (4.4), respectively. The methods used are
"Asm" for the asymptotic version and "Boot" for the bootstrap version

                 $T_n$                       Wald
c      Method    B       P       N           B       P       N
0      Asm       0.060   0.070   0.080       0.050   0.050   0.080
       Boot      0.050   0.060   0.060       0.060   0.060   0.060
0.10   Asm       0.150   0.140   0.150       0.130   0.130   0.150
       Boot      0.130   0.100   0.080       0.130   0.120   0.080
0.20   Asm       0.190   0.220   0.120       0.150   0.150   0.120
       Boot      0.170   0.160   0.080       0.190   0.180   0.080
0.25   Asm       0.350   0.340   0.240       0.320   0.310   0.240
       Boot      0.290   0.280   0.180       0.310   0.300   0.180
0.50   Asm       0.740   0.710   0.530       0.670   0.660   0.530
       Boot      0.700   0.630   0.500       0.720   0.630   0.500
0.70   Asm       0.940   0.940   0.870       0.930   0.920   0.870
       Boot      0.920   0.890   0.800       0.930   0.890   0.800
1.00   Asm       1.000   1.000   1.000       0.990   0.990   1.000
       Boot      0.990   0.990   0.960       0.990   0.990   0.960







Table 6

Empirical power at level 0.05 for hypothesis (4.6) using the wild bootstrap
procedure. Data were generated from (4.1) and (4.3) with
$\beta = (0.2, -1, 1)^{\top}$ and
$\varrho \in \{0, 0.05, 0.10, 0.15, 0.20, 0.5, 0.7\}$

$\varrho$   B       P       N
0           0.060   0.050   0.080
0.05        0.110   0.140   0.160
0.10        0.240   0.260   0.250
0.15        0.410   0.360   0.360
0.20        0.520   0.510   0.500
0.50        0.990   0.990   1.000
0.70        1.000   1.000   1.000


4.3. Real data example. To illustrate the proposed estimation method,
we consider a dataset from a Duchenne Muscular Dystrophy (DMD) study.
See Andrews and Herzberg (1985) for a detailed discussion of the dataset.
The dataset contains 209 observations corresponding to blood samples on
192 patients (17 patients have two samples) collected from a project to
develop a screening program for female relatives of boys with DMD. The
program's goal was to inform a woman of her chances of being a carrier
based on serum markers as well as her family pedigree. Another question of
interest is whether age should be taken into account in the analysis. Enzyme
levels were measured in known carriers (75 samples) and in a group of
noncarriers (134 samples). The serum marker creatine kinase (ck) is
inexpensive to obtain, while the marker lactate dehydrogenase (ld) is very
expensive to obtain. It is of interest to predict the value of ld by using the
level of ck, carrier status and age of patient.

Fig. 1. Estimated curves of the nonparametric function for the DMD study. The
solid and dotted lines were obtained using the naive and proposed methods,
respectively.

We consider the following model: $Y = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + g(U)$,
where $Z_1$ = ck is measured with errors and $Z_2$ = carrier status is exactly
measured, $U$ is age and $Y$ denotes the observed level of lactate
dehydrogenase. We justify the measurement error of $Z_1$ by regressing $Z_1$
on $U$. The estimates and associated standard errors based on the naive and
proposed methods are as follows:
$\hat\beta_{0,\mathrm{naive}} = 4.6057\,(0.113)$,
$\hat\beta_{1,\mathrm{naive}} = 0.1509\,(0.027)$ and
$\hat\beta_{2,\mathrm{naive}} = 0.2269\,(0.055)$;
$\hat\beta_{0,n} = 4.4296\,(0.329)$, $\hat\beta_{1,n} = 0.1775\,(0.042)$ and
$\hat\beta_{2,n} = 0.3702\,(0.050)$. The estimated curves of the nonparametric
function $g(u)$ are provided in Figure 1. Accounting for measurement errors,
the estimate of $\beta_1$ increases by about 17.2%, and the associated
standard error also increases by 55%. The estimate of $\beta_2$ also increases
when measurement errors are taken into account. The two estimated
nonparametric curves have similar patterns and differ only slightly.






Y. ZHOU AND H. LIANG 



5. Discussion. We developed estimation and inference procedures for the 
SVCPLM when parts of the parametric components are unobserved. The 
procedures are derived by incorporating ancillary information to calibrate 
the mismeasured variables and by applying the profile least-square-based 
principle. 

In some cases we may not have an auxiliary variable $\eta$, but we can observe
two or more independent replicates of $V$. For instance, when two measurements
$V_1$ and $V_2$, which satisfy $V_1 = \xi + u_1$ and $V_2 = \xi + u_2$ with
$E(u_1|V_2) = 0$ and $E(u_2|V_1) = 0$, are available, we can estimate $\xi$ by

$$\hat\xi(v) = \frac{\sum_{i=1}^{n}\{V_{i1}K_h(V_{i2} - v) + V_{i2}K_h(V_{i1} - v)\}}{\sum_{i=1}^{n}\{K_h(V_{i2} - v) + K_h(V_{i1} - v)\}},$$

because $E(V_1|V_2 = v) = E(V_2|V_1 = v) = E(\xi|V = v)$. The proposed
procedure applies to this situation as well, and results similar to those
presented in this paper can be obtained for the resulting estimator.
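The replicate-based calibration described above amounts to a symmetrized kernel regression of each measurement on the other. A minimal sketch follows, under assumptions that are not in the text: the function names are hypothetical, the Epanechnikov kernel is used as one example of a density with support $[-1, 1]$, and $K_h(t) = K(t/h)/h$.

```python
def epanechnikov(t):
    """Epanechnikov kernel, a density supported on [-1, 1] (assumed choice)."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def calibrate_from_replicates(v1, v2, v, h, kernel=epanechnikov):
    """Estimate E(xi | V = v) from paired replicates (v1[i], v2[i]).

    Each replicate serves as the response with the other as the smoothing
    variable, and the two kernel regressions are pooled.
    """
    num = 0.0
    den = 0.0
    for a, b in zip(v1, v2):
        k_b = kernel((b - v) / h) / h    # K_h(V_i2 - v)
        k_a = kernel((a - v) / h) / h    # K_h(V_i1 - v)
        num += a * k_b + b * k_a
        den += k_b + k_a
    if den == 0.0:
        raise ValueError("no replicate lies within one bandwidth of v")
    return num / den
```

Because the estimator is symmetric in the two replicates, swapping `v1` and `v2` leaves the result unchanged, which gives a quick sanity check when it is applied to data.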

It is of interest to extend the proposed methodology to a more general
semiparametric model: $E(Y|Z,X,U) = G\{\Theta^{\top}Z + \alpha^{\top}(U)X\}$,
where $G(\cdot)$ is a link function. The study of this model with mismeasured
components of $Z$ needs further investigation and is beyond the scope of this
paper.

APPENDIX 

In this Appendix, we list assumptions and outline proofs of the main 
results. The following technical assumptions are imposed: 

A.l. Assumptions. 

1. The random variable $U$ has a bounded support $\mathcal{U}$. Its density
function $f_U(\cdot)$ is Lipschitz continuous and bounded away from 0 on its
support. The density function of the random variable $V$, $f_V(v)$, is
continuously differentiable and bounded away from 0 and infinity on its finite
support $\mathcal{V}$. The functions $\{\alpha_i(u), i = 1, 2, \ldots, q\}$
have continuous second derivatives.

2. The $q \times q$ matrix $E(XX^{\top}|U)$ is nonsingular for each
$U \in \mathcal{U}$. All elements of the matrices $E(XX^{\top}|U)$,
$E(XX^{\top}|U)^{-1}$ and $E(ZX^{\top}|U)$ are Lipschitz continuous.

3. The kernel functions $K(\cdot)$ and $L(\cdot)$ are density functions with
compact support $[-1,1]$.

4. There is an s > 2 such that -E||Z|p* < 00 and -E||X|p'^ < cxd and for some 

5 <2- s-^ such that n^^'^h 00, n'^^-Hk ^ 00 and nhb'j^''^'^^ 0, 
k = 1,2, . . . ,pi, where bk is the bandwidth parameter in the polynomial 
estimator Ck{') of Cfe(")- 

5. n/i^ —> and ?i/i^/(logn)^ —> 00. 
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A.2. Preliminary lemmas. Write Cni = O^f^ + /l^ Cn2 = C-^Y'"^ + 

Cn = Cnl+Cn2- 

Lemma A.1. Suppose that $(Z_i, X_i, U_i)$, $i = 1, 2, \ldots, n$, are i.i.d.
random vectors, that $E|g(X, Z, U)| < \infty$ and that
$E[g(\cdot, \cdot, U)|U = u]$ has a continuous second derivative in $u$.
Further assume that $E(|g(X, Z, U)|^s \mid Z = z, X = x) < \infty$. Let $K$ be
a bounded positive function with a bounded support satisfying the Lipschitz
condition. Given that $n^{2\delta-1}h \to \infty$ for some
$\delta < 1 - s^{-1}$, we have

$$\sup_{u \in \mathcal{U}} \left| \frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u)\left(\frac{U_i - u}{h}\right)^{k} g(X_i, Z_i, U_i) - f(u)E\{g(X, Z, u)|U = u\}\mu_k \right| = O(c_{n1}) \quad \text{a.s.}$$

Furthermore, assume that $E[\varepsilon_i|Z_i, X_i, U_i] = 0$ and
$E[|\varepsilon_i|^s \mid Z_i, X_i, U_i] < \infty$; then

$$\sup_{u \in \mathcal{U}} \left| \frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u)\, g(X_i, Z_i, U_i)\,\varepsilon_i \right| = O(c_{n1}) \quad \text{a.s.}$$

Proof. The first result follows from an argument similar to that of Lemma
A.2 of Fan and Huang (2005). The second result follows from the first result
and an argument similar to that of Xia and Li (1999). □

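Lemma A.2. Suppose that $E[g(X, Z, u)|U = u]$ has a continuous second
derivative in $u$ and $E|g(X, Z, U)|^s < \infty$. Under Assumptions 1-5, we
have

$$\sup_{u \in \mathcal{U}} \left| \frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u)\left(\frac{U_i - u}{h}\right)^{k} g(X_i, Z_i, U_i)\,\hat\xi_i^{\top} - f(u)E\{g(X, Z, u)\xi^{\top}|U = u\}\mu_k \right| = O(c_n) \quad \text{a.s.}$$

and

$$\sup_{u \in \mathcal{U}} \left| \frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u)\, g(X_i, Z_i, U_i)\, h(\hat\xi_i)\,\varepsilon_i \right| = O(c_n),$$

where $h(\cdot)$ is a twice continuously differentiable function.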



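Proof. Note that
$\frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u)\left(\frac{U_i - u}{h}\right)^{k} g(X_i, Z_i, U_i)\,\hat\xi_i^{\top}$
can be decomposed as

$$\frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u)\left(\frac{U_i - u}{h}\right)^{k} g(X_i, Z_i, U_i)\,\xi_i^{\top} + \frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u)\left(\frac{U_i - u}{h}\right)^{k} g(X_i, Z_i, U_i)\,(\hat\xi_i - \xi_i)^{\top}.$$

By Lemma A.1, the first term equals
$f_U(u)E\{g(X, Z, u)\xi^{\top}|U = u\}\mu_k + O(c_{n1})$ uniformly in
$u \in \mathcal{U}$ in probability. Recalling the asymptotic expression given
in (2.1) and using Lemma A.1, one can show that the second term is
$O(c_{n2})$. This completes the proof of Lemma A.2. □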

Lemma A. 3. ^(•, •, u) has a continuous second derivative on u and E\g{^, 
'Zi,U)\ <oo. Under Assumptions 1-5, n~^Yl^=i{^i~'^i)^\9{'^i^'^i^Ui) is of 
order 0{cn) a.s., where / = 0, 1. 

Proof. The proof follows from (2.1) and arguments similar to Lemma 
A.2. □ 

Lemma A. 4. Under Assumptions 1-5, we have 

(z'^z)-izT(i-s)z 

E^(z.,x.,^.)[{^(-^^)(v.)f,o] 

1 " " 1 \ 
+ - E E 77^^(2^'^- U,)L,iV, - V,)(eJ, 0) {1 + 0(1)} 

in probability. 

Proof. We first prove that 

(A.l) -z^z^i:. 

n 

A direct calculation yields 

(A.2) D^WuDu = nU{U)T{U) ® f ^ + Op(cni)}- 

On the other hand, Lemma A.S implies 

(A.S) DlWut = nU{U)^U) ® (l,m)^{l + Op(cn)}. 

A combination of (A.2) and (A.S) implies 

(A.4) {^^MDlWuDur^DlWu'L = y^^T~\U)^U){l + 0^{cn)} 
and then 

(A.S) Z, = Z^-$T(C/,)^-Hf/^)X,{l + Op(c„)}, i = l,2,...,n. 

It follows from these arguments that n'^Z^Z = i YA=i{^i7'i,^h Ui)]®'^{l + 
Op(cn)}i and (A.l) follows. 
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Note that ZT(I - S)(Z - Z) = - 8)^(1 - S)(Z - Z) - (Z - Z)T(I - 

S)'^(I — S)(Z — Z) =^ Ji — J2. The second term, J2, is Op(c,^) by Lemma 
A.3. Write Z* = (I - S)Z. We have Ji = Zj(Z - Z) - ZjS(Z - Z). It follows 
from (2.1) that 

DuWu{7. - Z) 







I ri n / 1 

+ E E fv\yi)Kh{U^ - U)Lb{Vj - Fi)X,eT 65 ( f/^, - f/ 



i=ij=i \ /i 

+ 0(6^+1 + log 6" V\/n6). 
By an argument similar to that of (A. 5), we derive 
ZjS(Z-Z) 

1 " 



1 

+ - E fv\yi)Kh{U, - Ui)Lb{Vj - Vi)X,e 



T 



1 0^ 

h ' 



x{l + op(l)}, 
where p(Z;,X;,[/i) can be expressed as 

V^(Zz,Xz, [//){! + Op(c„)}(x;^,0) 

x|/„([/,)r([/o® (^^^ j;^){i + Op(c„)}| 



V;(Z;,X;,[/OXT 



' r^l([/0®(;U2,-m){l + Op(Cn)}. 
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Denote by p{'Li ,^i,Ui) the main term of the right-hand side of the above for- 
mula. Note that £^{p(Zi, X;, C//)|J7/} = 0. By Lemma 3 of Chen, Choi and Zhou 
(2005) we have 

I n n n Xe'? 

Z3 E E E ^/^(^^ - Ui)Hv, - Vi)p{Zi,Xi,Ui)-^ 
^ i=ij=ii=i j^y^i) 

(A.6) 

= 0p(c„n-i/2). 

Furthermore, we can show in a similar way as that for (A.6), that 

2,\iv EE^^(^»-^Op(z^x,,[/ox.u(-+^)(i^Or = Op(c^)- 

These arguments imply that 

(A.7) n-izJS(Z-Z) = Op(c2). 

We now deal with the term Z'^{Z-Z). Note that Zj(Z-Z) equals X;r=i V'(Zi, X^, 
Ui){{^i — ^i)'^,0}, which can be further decomposed as 

Ci^. c,b .^^(z^^x„c/,)[{^(^+i)(y.)}^o] 



(r + l)! 



i=l 
n n 



-j 7t a -1 

This completes the proof of Lemma A. 4. □ 

Lemma A. 5. Under Assumptions 1-5, we haveZ^ {\ — S){\ — S)'^Z/n ^ 
S in probability and ± = n{Z^Z)-^Z'^ {I - S)(I - S)Z(Z'^Z'^)-i ^ 

Proof. The proof of the first result can be finished by arguments sim- 
ilar to those of Lemmas A.2-A.4, while the second one can be proved by 
arguments similar to Lemma 7.3 of Fan and Huang (2005). □ 

Lemma A.6. Under Assumptions 1-5, we have Z'^ {I — S)'M./n = Op{c^). 

Proof. The proof follows (A. 5) and an argument similar to that of 
Lemma 7.4 of Fan and Huang (2005). □ 

Lemma A. 7. g{-) and h{-) are two continuous function vectors. Under 
Assumptions 1-5, we have X]iLi(Zi — Zi)g[Zi)ei — > and J27=i{'^i ~ 
Zi)X.J h{Ui)£i -^0 in probability. 
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Proof. The proof follows from arguments similar to those of Lemma 
A.2. □ 

Lemma A. 8. Under Assumptions 1-5, we have 

n 

ZT(I - S)£ = ^ Ui)Xi{l + op(l)}ei + oin^/^), 



i=l 



where e = (ei, . . . , 



Proof. Note that zT(I-S)£ = ELi M^i - (X„0)(2?„,W^„,D«J-1D„, x 
WuiS}. By the same argument as those for (A. 3), we have 



i=l 



This formula along with (A. 2) yields 

{^^MDlWuDur^DuWue = y.^T-\U)E{y.\U)Op{cn). 

A combination of these arguments with Lemma A. 7 finishes the proof of 
Lemma A. 8. □ 

Proof of Theorem 1. Note that 0„ can be expressed as {Z^Z)-^7l^ {1- 
S)Z0 + (ZTZ)-IZT(I - S)M + (ZTZ)-^ZT(I - S)£. By Lemma A.8, the 
third term equals T.^^n^^ YJLi ^(Zi,Xi, Ui)ei{l + op(l)} + op(n-V2). r^j^g 
first term equals, via Lemma A. 4, 

/" Q — 1 'LT -\-\ ^ 

nir + 1 ! 

n n -, 



i=lj- 



By Lemma A. 6 and (A.l), it follows that the second term of ©n's expression 
is of order O(c^) in probability. These arguments imply that 



@n-@0 



CiSu Cpb 



1^ A^+l n 



nir + 



-— E^(Z.,X,,C/,){€(^+i)(Fi)r/3o 
''■ i=i 



+ ;;2 E E T7T7T^(Z- ^*)^''(^^- - ^')S^^o 



1 " 

-EV'(Zi,Xi,?70ei 



{l + op(l)}. 
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This completes the proof of Theorem 1. □ 



Proof of Theorem 3. By the definition of *(n), we have 

H* = {D^WuDur^D^WuiY - Z0„) 

= h + {D^WuDur^D^WuiZ - Z)0 

+ {D^WuDur^D^WuZ{@ - 0„) + Rn, 

where h = {D^WuD^)-' D^WuiY -Ze) and i?„ = {D^WuDu)-' DuWu{Z- 
Z)(0 — ©n)- It is easy to show that i?„ = o(n~^/^) in probabihty. Note that 

D^Wu{Z-Z)@ = |^^^|^^Ei^.(t/..--) 



n n 



+ -J2J2fv\^)KtM - u)U{V, - Vi) 



'^1-1 

1=1 ]=1 



It follows from (A. 2) that 



(A.8) =^fi^W-M«(' 



and 



£;(x,|y = vs)eJ/3 

\ h-\Ui - u)E{Xi\V = Vj)eJ f3 



{l + op(l)} 
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r-i(n) 



■Y,Kh{Ui-u)E{Xi\V = Vi)eJp 



{l + Op(l)}. 



/i2 - fJ'iiUi -u)/h 
{Ui - u)/h - III 



Furthermore, (A. 3) implies that 

{DlWuD^r^DlWu'Li® - @r 



fu{u)T{u) ® 



1 /ii 



(A.9) 



X {nU{u)^{u) ® (l,/xi)^}(0 - e„){l + op(l)} 
= {T-\u)^u) ® (1, Or}(0 - 0„){1 + op(l)}. 

We therefore have h = {DlWuDuT^DlWuM.u + {DlWuDuY^DlWue, where 
M„ = q(m)TX. 

By the Taylor expansion and a direct simplification, we have 

/y^Jcxiu) + (C/i -ii)X7Q;'(n) + 2-i(C/i -'u)2x7a"(n) 
M= : \+o{K' 

\y.lcx{u) + (?7„ - n)XTQ'(n) + 2-i([/„ - uflLlcx"{u) 

/{Ui-ufXja"{u)' 



ha'{u) 



2 



+ o{h'). 



Hence, 



(A.IO) 



2 

1 nT 



(?7i -n)2x7a"(n) 



+ (L»„^iy„L»„)-^L>„^M^„£ {1 + Op{h')}. 



XUn-ufy.lcx"{u), 

It follows from (A.8)-(A.10) that \/n/iH{^'(no) — ^'(no)} can be represented 



as 



n(r:'i)iJS;:(„) ^-''^>g^'^-"' 

{([/,- n)//i-/ii}X,{^(^+^)(l^0r/3o 
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2(^2 



{(c/i-ti)//i-/_ii}XiX/ y ' 



\/nhV ^(u) , 
nfu{u){n2-l^i) ^ 



xS(Xi|F = F,)eF/3' 

X{l+0p(l)}. 

By an argument similar to that of Lemma A. 8, we have 
{D^WuDur^DZ^WuB 



/i2 - /Wl(f/i - 

(C/i - u)/h - m 



M2 - u)/h 

{Ui -u)/h- fii 

The proof of Theorem 3 is completed. □ 



{l + op(l)}. 



Proof of Theorem 5. The proof is similar to those of Theorems 3.1 and
3.2 of Fan and Huang (2005). We only give a sketch. We first prove that
$n^{-1}\mathrm{RSS}_1 = \sigma^2\{1 + o_p(1)\}$.

By a procedure similar to that of Theorem 3.2 in Fan and Huang (2005), 

we can obtain that n-'^RSSio = n"^ ELiC^i - - ®^ "Lif = ^^{l + op(l)}, 
where Mjo is the iih. element of Mq = S(Y — Z0). A direct calculation yields 
that 

n^^{RSSi - ESS lo) 

n 



= n-i^e (Z, - Zi){{Yi - M, - e Zi) + (Y, - Mio - e Zi)} 

i=l 

(A.11) 

+ n-i5^(Mi-Mio){(yi-M,-0 Zi) 



+ {Y, - Mio - 0^Zi)}. 
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By (2.1), Theorem 2 and the Jensen inequality, we know that the first term 
in the right-hand side of (A. 11) is bounded by 



- T - 

max Zj — Zj 

l<i<n 



(A.12) 




1 ^''^ 



+ max {\Mi - Mio\ + G)^|Zi - Zi\} 

l<i<n 



which is $o_p(1)$. A similar argument shows that the second term on the
right-hand side of (A.11) is also $o_p(1)$. We therefore have
$n^{-1}\mathrm{RSS}_1 = \sigma^2\{1 + o_p(1)\}$.

Furthermore, RSSq can be decomposed as {Y - M - Z0 Z (0 - 00 ) {Y - 
M - Z0 + Z(0 - 0o)} = RSSi + Q1 + Q2 + Q3, where Qi = {Z(0 - 
0o)r{Z(0 - 0o)}, Q2 = (Y - M - Z0){Z(0 - ©o)} and Q3 = {Z(0 - 
0o)}T(Y-M-Z0). 

Recalling the expression of 0o and the result given in (A.l), we know that 
n-^Z^Z ^ E in probability, and Qi - n0'^A'r{AS~^ A'^}-^ A0 ^ cj^ x 
J2i=i ^iXii ™ distribution. In an analogous way, we can show that Q2 and 
Q3 are asymptotic negligible in probability. These statements, along with 
the Slutsky theorem, imply that 2r„ - na^'^Q'^A'^{A'S''^A^}~^A@ 
J2i=i ^iXii ill distribution. Finally, following the lines of Rao and Scott (1981), 
we can prove that the distribution of Qn J2i=i ^ixfi has the same approxi- 
mate distribution as xf^ complete the proof of Theorem 5. □ 
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